SpeechFind: an experimental on-line spoken document retrieval system for historical audio archives

Authors

  • Bowen Zhou
  • John H. L. Hansen
Abstract

In this study, we present the SpeechFind system, an experimental on-line spoken document retrieval system for historical audio archives. As part of an on-going U.S. NSF Digital Library Initiative project, entitled the National Gallery of the Spoken Word (NGSW), SpeechFind is intended to serve as an audio index and search engine for spoken word collections spanning the 20th century with as much as 60,000 hours of audio archives. In this paper, we describe the system architecture of SpeechFind, with focus on audio data transcription and information retrieval components. Using a sample test audio data collection from the past 60 years, an evaluation of individual system components and overall performance is presented.

Similar resources

Advances in SpeechFind: transcript reliability estimation employing confidence measure based on discriminative sub-word model for SDR

This study presents recent advances in our spoken document retrieval (SDR) system SpeechFind, including our partnership with the Collaborative Digitization Program (CDP). A prototype of SpeechFind for the CDP is currently serving as the search engine for 1,300 hours of the CDP audio content. This audio corpus of spoken documents possesses a wide range of conditions which make speech recogniti...

A robust fusion method for multilingual spoken document retrieval systems employing tiered resources

In this study, we present two novel fusion approaches to merge subword and word based retrieval methods within a multilingual spoken document retrieval (SDR) system. Considering the fact that more than 6000 languages are spoken in the world today, resources (e.g., text and audio data, pronunciation lexicon) needed to develop Automatic Speech Recognition (ASR) systems for such a range of languag...

An experimental study of an audio indexing system for the web

We have developed a speech recognition based audio search engine for indexing spoken documents found on the World Wide Web. Our site (http://www.compaq.com/speechbot) indexes around 20 news and talk radio shows covering a wide range of topics, speaking styles and acoustic conditions from a selection of public Web sites with multimedia archives. In this paper, we describe our system and its perf...

Spoken Document Retrieval for the Languages of Hong Kong

The advent of the information age has brought massive digital libraries of multimedia and multilingual content. This creates a high demand for multimedia and multilingual indexing and retrieval technologies, e.g. those applicable to audio archives. This paper reports on our development of spoken document retrieval (SDR) technologies, where speech recognition is combined with information retriev...

Syllable-based Language Models in Speech Recognition for English Spoken Document Retrieval

The spoken content of audio/visual collections such as TV or radio archives is an information resource of enormous potential. The challenge is to develop methods that will make it possible to browse or search these collections. The experimental results presented in this paper demonstrate that syllable-level transcripts provide an important supplement to conventional word-level transcripts for t...

Publication date: 2002